Page 1 of 2

2023 Conference article Open Access

(Semi)automated disambiguation of scholarly repositories
Baglioni M., Mannocci A., Pavone G., De Bonis M., Manghi P.
The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries are kingpins in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers are keen on registering multiple times the repositories they manage to maximise their traction and visibility across different research communities, disciplines, and applications. These multiple registrations ultimately lead to information fragmentation and redundancy on the one hand and, on the other, force registries' users to juggle multiple registries, profiles and identifiers describing the same repository. Such problems are known to registries, which claim equivalence between repository profiles whenever possible by cross-referencing their identifiers across different registries. However, as we will see, this "claim set" is far from complete and, therefore, many replicas slip under the radar, possibly creating problems downstream. In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. Then we manually validate our results to produce an "as accurate as possible" de-duplicated dataset of scholarly repositories.Source: IRCDL 2023 - 19th conference on Information and Research Science Connecting to Digital and Library Science, pp. 47–59, Bari, Italy, 23-24/02/2023
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2023 Journal article Open Access

Graph-based methods for author name disambiguation: a survey
De Bonis M., Falchi F., Manghi P.
Scholarly knowledge graphs (SKG) are knowledge graphs representing research-related information, powering discovery and statistics about research impact and trends. Author name disambiguation (AND) is required to produce high-quality SKGs, as a disambiguated set of authors is fundamental to ensure a coherent view of researchers' activity. Various issues, such as homonymy, scarcity of contextual information, and cardinality of the SKG, make simple name string matching insufficient or computationally complex. Many AND deep learning methods have been developed, and interesting surveys exist in the literature, comparing the approaches in terms of techniques, complexity, performance, etc. However, none of them specifically addresses AND methods in the context of SKGs, where the entity-relationship structure can be exploited. In this paper, we discuss recent graph-based methods for AND, define a framework through which such methods can be confronted, and catalog the most popular datasets and benchmarks used to test such methods. Finally, we outline possible directions for future work on this topic.Source: PeerJ Computer Science 9 (2023). doi:10.7717/peerj-cs.1536
DOI: 10.7717/peerj-cs.1536
Project(s): EOSC Future via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

Metrics:

See at: PeerJ Computer Science Open Access | ISTI Repository | peerj.com | CNR ExploRA

2023 Conference article Open Access

A graph neural network approach for evaluating correctness of groups of duplicates
De Bonis M., Minutella F., Falchi F., Manghi P.
Unlabeled entity deduplication is a relevant task already studied in the recent literature. Most methods can be traced back to the following workflow: entity blocking phase, in-block pairwise comparisons between entities to draw similarity relations, closure of the resulting meshes to create groups of duplicate entities, and merging group entities to remove disambiguation. Such methods are effective but still not good enough whenever a very low false positive rate is required. In this paper, we present an approach for evaluating the correctness of "groups of duplicates", which can be used to measure the group's accuracy hence its likelihood of false-positiveness. Our novel approach is based on a Graph Neural Network that exploits and combines the concept of Graph Attention and Long Short Term Memory (LSTM). The accuracy of the proposed approach is verified in the context of Author Name Disambiguation applied to a curated dataset obtained as a subset of the OpenAIRE Graph that includes PubMed publications with at least one ORCID identifier.Source: TPDL 2023 - 27th International Conference on Theory and Practice of Digital Libraries, pp. 207–219, Zadar, Croatia, 26-29/09/2023
DOI: 10.1007/978-3-031-43849-3_18
Project(s): OpenAIRE Nexus via OpenAIRE

Metrics:

See at: doi.org Open Access | link.springer.com | ISTI Repository | CNR ExploRA

2023 Report Unknown

InfraScience research activity report 2023
Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bosio C., Bove P., Calanducci A., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., Ibrahim A. S. T., La Bruzzo S., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Molinaro E., Pagano P., Panichi G., Paratore M. T., Pavone G., Piccioli T., Sinibaldi F., Straccia U., Vannini G. L.
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2023 to highlight the major results. In particular, the InfraScience group engaged in research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2023 InfraScience members contributed to the publishing of several papers, to the research and development activities of several research projects (primarily funded by EU), to the organization of conferences and training events, to several working groups and task forces.Source: ISTI Annual Reports, 2023
DOI: 10.32079/isti-ar-2023/002
Project(s): Blue Cloud via OpenAIRE

, EOSC Future via OpenAIRE

, TAILOR

Metrics:

See at: CNR ExploRA

2022 Conference article Open Access

Towards unsupervised machine learning approaches for knowledge graphs
Minutella F., Falchi F., Manghi P., De Bonis M., Messina N.
Nowadays, a lot of data is in the form of Knowledge Graphs aiming at representing information as a set of nodes and relationships between them. This paper proposes an efficient framework to create informative embeddings for node classification on large knowledge graphs. Such embeddings capture how a particular node of the graph interacts with his neighborhood and indicate if it is either isolated or part of a bigger clique. Since a homogeneous graph is necessary to perform this kind of analysis, the framework exploits the metapath approach to split the heterogeneous graph into multiple homogeneous graphs. The proposed pipeline includes an unsupervised attentive neural network to merge different metapaths and produce node embeddings suitable for classification. Preliminary experiments on the IMDb dataset demonstrate the validity of the proposed approach, which can defeat current state-of-the-art unsupervised methods.Source: IRCDL 2022 - 18th Italian Research Conference on Digital Libraries, Padua, Italy, 24-25/02/2022
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2022 Conference article Open Access

A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph
Vichos K., De Bonis M., Kanellos I., Chatzopoulos S., Atzori C., Manola N., Manghi P., Vergoulis T.
In recent years, a large number of Scholarly Knowledge Graphs (SKGs) have been introduced in the literature. The communities behind these graphs strive to gather, clean, and integrate scholarly metadata from various sources to produce clean and easy-to-process knowledge graphs. In this context, a very important task of the respective cleaning and integration workflows is deduplication. In this paper, we briefly describe and evaluate the accuracy of the deduplication algorithm used for the OpenAIRE Research Graph. Our experiments show that the algorithm has an adequate performance producing a small number of false positives and an even smaller number of false negatives.Source: IRCDL 2022 - 18th Italian Research Conference on Digital Libraries, Padua, Italy, 24-25/02/2022
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2022 Report Open Access

InfraScience research activity report 2021
Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bove P., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., La Bruzzo S., Lazzeri E., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Ottonello E., Pagano P., Panichi G., Pavone G., Piccioli T., Sinibaldi F., Straccia U.
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2021 to highlight the major results. In particular, the InfraScience group confronted with research challenges characterising Data Infrastructures, eScience, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2021 InfraScience members contributed to the publishing of 25 papers, to the research and development activities of 18 research projects (15 funded by EU), to the organization of conferences and training events, to several working groups and task forces.Source: ISTI Annual report, 2022
DOI: 10.32079/isti-ar-2022/001
Project(s): ARIADNEplus via OpenAIRE

, Blue Cloud via OpenAIRE

, PerformFISH via OpenAIRE

, EOSC-Pillar via OpenAIRE

, DESIRA

, EOSC Future via OpenAIRE

, EOSCsecretariat.eu via OpenAIRE

, EcoScope via OpenAIRE

, RISIS 2

, OpenAIRE-Advance via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

, SoBigData-PlusPlus via OpenAIRE

Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2022 Journal article Open Access

FDup: a framework for general-purpose and efficient entity deduplication of record collections
De Bonis M., Manghi P., Atzori C.
Deduplication is a technique aiming at identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata record data model definition, identification of candidate duplicates, identification of duplicates. FDup brings two main innovations: first, it delivers a full deduplication framework in a single easy-to-use software package based on Apache Spark Hadoop framework, where developers can customize the optimal and parallel workflow steps of blocking, sliding windows, and similarity matching function via an intuitive configuration file; second, it introduces a novel approach to improve performance, beyond the known techniques of "blocking" and "sliding window", by introducing a smart similarity matching function T-match. T-match is engineered as a decision tree that drives the comparisons of the fields of two records as branches of predicates and allows for successful or unsuccessful early-exit strategies. The efficacy of the approach is proved by experiments performed over big data collections of metadata records in the OpenAIRE Research Graph, a known open access knowledge base in Scholarly communication.Source: PeerJ Computer Science 8 (2022). doi:10.7717/PEERJ-CS.1058
DOI: 10.7717/peerj-cs.1058
Project(s): OpenAIRE Nexus via OpenAIRE

Metrics:

See at: OpenAIRE Open Access | ISTI Repository | peerj.com | CNR ExploRA

2022 Software Unknown

dnet-dedup framework
Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., La Bruzzo S. F., Mannocci A., Manghi P.
The GDup Software enables an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup supports practitioners with the functionalities needed to realize a fully-fledged entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph. GDup is today one of the core components of the OpenAIRE infrastructure production system, monitoring Open Science trends on behalf of the European Commission.Project(s): OpenAIRE-Advance via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

See at: github.com | CNR ExploRA

2022 Report Open Access

Data model description of the OpenAIRE Research Graph
La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Mannocci A., Manghi P., Pavone G.
The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key to fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. Imagine a vast collection of research products all linked together, contextualized, and openly available. For the past years, OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organizations, funders, funding streams, projects, communities, and data sources. This technical Report describes the public data model adopted by the OpenAIRE Graph.Source: ISTI Technical Report, ISTI-2022-TR/031, 2022
DOI: 10.32079/isti-tr-2022/031
Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2022 Report Open Access

OpenAIRE Research Graph: aggregation workflow
La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., Mannocci A., Manghi P., Pavone G.
The OpenAIRE Graph (formerly the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide. It is key in fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. OpenAIRE collects metadata records from more than 70K scholarly communication sources worldwide, including Open Access institutional repositories, data archives, and journals. All the metadata records (i.e., descriptions of research products) are put together in a data lake with records from Crossref, Unpaywall, ORCID, ROR, and information about projects provided by national and international funders. This technical Report describes the main Aggregation Workflow to orchestrate the data aggregation and the implemented mapping from some of the main datasources into the OpenAIRE research graph data model.Source: ISTI Technical Report, ISTI-2022-TR/033, 2022
DOI: 10.32079/isti-tr-2022/033
Project(s): OpenAIRE-Advance via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2022 Report Open Access

OpenAIRE Research Graph deduplication workflow
La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Mannocci A., Manghi P., Pavone G.
The OpenAIRE aggregation workflow can collect metadata records from different providers about the same scholarly work. Each metadata record can carry different information because, for example, some providers are not aware of links to projects, keywords, or other details. Another typical case is when OpenAIRE collects one metadata record from a repository about a pre-print and another from a journal about the published article. To provide correct statistics, OpenAIRE must identify those cases and "merge" the two metadata records so that the scholarly work is counted only once in the statistics OpenAIRE produces. This technical Report describes the Deduplication workflow and technique adopted to deduplicate the OpenAIRE Graph.Source: ISTI Technical Report, ISTI-2022-TR/032, 2022
DOI: 10.32079/isti-tr-2022/032
Project(s): OpenAIRE-Connect via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2022 Report Open Access

OpenOrgs: a tool for the disambiguation of organizations
Artini M., La Bruzzo S. F., De Bonis M., Pavone G.
Organizations appear all over the Research & Innovation ecosystem in different shapes and formats: the same organization may appear with different metadata fields, different names - e.g., full legal name, short or alternative names, acronym. The ambiguity of organizations results in a huge deficiency in the exchange of information, the findability of research products, the monitoring of activities, and ultimately building a linked open scholarly communication system. OpenOrgs combines an automated process and human curation to compensate for the lack of information available and improve the organization's discoverability.Source: ISTI Technical Report, ISTI-2022-TR/034, 2022
DOI: 10.32079/isti-tr-2022/034
Project(s): OpenAIRE-Advance via OpenAIRE

, OpenAIRE Nexus via OpenAIRE

Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2022 Report Open Access

InfraScience research activity report 2022
Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bove P., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., La Bruzzo S., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Ottonello E., Pagano P., Panichi G., Pavone G., Piccioli T., Sinibaldi F., Straccia U., Zoppi F.
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2022 to highlight the major results. In particular, the InfraScience group confronted with research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2022 InfraScience members contributed to the publishing of several papers, to the research and development activities of 18 research projects (15 funded by EU), to the organization of conferences and training events, to several working groups and task forces.Source: ISTI Annual reports, 2022
DOI: 10.32079/isti-ar-2022/004
Project(s): ARIADNEplus via OpenAIRE

, Blue Cloud via OpenAIRE

, EOSC-Pillar via OpenAIRE

, DESIRA

, EOSC Future via OpenAIRE

, RISIS 2

, TAILOR

, SoBigData-PlusPlus via OpenAIRE

Metrics:

See at: ISTI Repository Open Access | CNR ExploRA

2021 Dataset Unknown

OpenAIRE research graph: dumps for research communities and initiatives
Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Lohden A., Backer A., Mannocci A., Horst M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Ioannidis A., Summan F.
This dataset contains dumps of the OpenAIRE Research Graph containing metadata records relevant for the research communities and initiatives collaborating with OpenAIRE. Each dataset is a tar file containing gzip files with one json per line. Each json is compliant to the schema available at DOI: 10.5281/zenodo.3974226DOI: 10.5281/zenodo.3974604
Project(s): RISIS 2 via OpenAIRE

, BE OPEN

, OpenAIRE-Advance via OpenAIRE

Metrics:

See at: CNR ExploRA

2021 Dataset Unknown

OpenAIRE Covid-19 publications, datasets, software and projects metadata
Bardi A., Kuchma I., Pavone G., Artini M., Atzori C., Backer A., Baglioni M., Czerniak A., De Bonis M., Dimitropoulos H., Foufoulas I., Horst M., Iatropoulou K., Jacewicz P., Kokogiannaki A., La Bruzzo S., Lazzeri E., Lohden A., Manghi P., Mannocci A., Manola N., Ottonello E., Schirrwagen J.
This dump provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dump contains records of the OpenAIRE COVID-19 Gateway (https://covid-19.openaire.eu/), identified via full-text mining and inference techniques applied to the OpenAIRE Research Graph (https://explore.openaire.eu/). The Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources world-wide, among which the Covid-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! FInder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA. The dump consists of a gzip file containing one json per line. Each json is compliant to the schema available at https://doi.org/10.5281/zenodo.3974226DOI: 10.5281/zenodo.3980490
Project(s): OpenAIRE-Advance via OpenAIRE

Metrics:

See at: CNR ExploRA

2020 Journal article Open Access

Entity deduplication in big data graphs for scholarly communication
Manghi P., Atzori C., De Bonis M., Bardi A.
Purpose: Several online services offer functionalities to access information from "big research graphs" (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from search and browse content to the consumption of statistics for monitoring and provision of feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and actual problem, existing solutions are dedicated to specific scenarios, operate on flat collections, local topology-drive challenges and cannot therefore be re-used in other contexts. Design/methodology/approach: This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrary large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system and reports numbers of real-case experiments. Findings: GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph. Originality/value: To our knowledge GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, National funders and institutions.Source: Data technologies and applications 54 (2020): 409–435. doi:10.1108/DTA-09-2019-0163
DOI: 10.1108/dta-09-2019-0163
Project(s): OpenAIRE2020 via OpenAIRE

, OpenAIRE-Advance via OpenAIRE

Metrics:

See at: Data Technologies and Applications Open Access | ISTI Repository | www.emerald.com | Data Technologies and Applications | CNR ExploRA

2019 Report Open Access

The OpenAIRE research graph: third-party publishing APIs
Atzori C., Baglioni M., Bardi A., Manghi P., La Bruzzo S., De Bonis M., Dell'Amico A., Artini M., Mannocci A., Ottonello E.
This work describes the specification of the OpenAIRE publishing APIs that support third-party services at publishing metadata about interlinked and packaged research products into the OpenAIRE Research Graph, in respect of the OpenAIRE interoperability guidelines (https://guidelines.openaire.eu). Research products generated by researchers using services of research infrastructures are today manually published by researchers in a repository external to their research infrastructure. This phase is often considered an extra burden, because researchers have to fill in metadata forms with information that is already available in the scope of the services they used. By using the OpenAIRE publishing APIs, services of research infrastructures can implement an on-demand publishing workflow for any type of research products to support their researchers at improving the FAIRness of their research products and relief them from the tedious step of finding a suitable repository and manually depositing the products in it.Source: ISTI Technical reports, 2019

See at: ISTI Repository Open Access | CNR ExploRA

2019 Conference article Open Access

The OpenAIRE Research Community Dashboard: On Blending Scientific Workflows and Scientific Publishing
Baglioni M., Bardi A., Kokogiannaki A., Manghi P., Iatropoulou K., Principe P., Vieira A., Nielsen L. H., Dimitropoulos H., Foufoulas I., Manola N., Atzori C., La Bruzzo S., Lazzeri E., Artini M., De Bonis M., Dell'Amico A.
Despite the hype, the effective implementation of Open Science is hindered by several cultural and technical barriers. Researchers embraced digital science, use "digital laboratories" (e.g. research infrastructures, thematic services) to conduct their research and publish research data, but practices and tools are still far from achieving the expectations of transparency and reproducibility of Open Science. The places where science is performed and the places where science is published are still regarded as different realms. Publishing is still a post-experimental, tedious, manual process, too often limited to articles, in some contexts semantically linked to datasets, rarely to software, generally disregarding digital representations of experiments. In this work we present the OpenAIRE Research Community Dashboard (RCD), designed to overcome some of these barriers for a given research community, minimizing the technical efforts and without renouncing any of the community services or practices. The RCD flanks digital laboratories of research communities with scholarly communication tools for discovering and publishing interlinked scientific products such as literature, datasets, and software. The benefits of the RCD are show-cased by means of two real-case scenarios: the European Marine Science community and the European Plate Observing System (EPOS) research infrastructure.Source: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL, pp. 56–69, Oslo, Norway, September 9-12, 2019
DOI: 10.1007/978-3-030-30760-8_5
DOI: 10.5281/zenodo.3467104
DOI: 10.5281/zenodo.3467103
Project(s): OpenAIRE-Connect via OpenAIRE

, OpenAIRE-Advance via OpenAIRE

Metrics:

2019 Dataset Unknown

OpenAIRE Research Graph Dump
Manghi P., Atzori C., Bardi A., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Loehden A., Baecker A., Mannocci A., Horst M., Baglioni M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Nielsen L. H., Ioannidis A., Bigarella C., Summan F.
The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community. Imagine a vast collection of research products all linked together, contextualised and openly available. For the past ten years OpenAIRE has been working to gather this valuable record. OpenAIRE is pleased to announce the beta release of its Research Graph, a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources. As of today, the OpenAIRE Research Graph aggregates around 450Mi metadata records with links collecting from 10,000 data sources trusted by scientists, including repositories registered in OpenDOAR, Open Access journals registered in DOAJ, Crossref, Unpaywall, ORCID and Microsoft Academic Graph. After cleaning, deduplication, and fine-grained classification processes, they narrow down to ~100Mi publications, ~8Mi datasets, ~200K software research products, 8Mi other products linked together with semantic relations. More than 10Mi full-texts of Open Access publications are mined by algorithms to enrich metadata records with additional properties and links among research products, funders, projects, communities, and organizations. Thanks to the mining algorithm, the graph is completed with 480Mi semantic relations. The OpenAIRE Research graph is available via our BETA Explore Portal and you can download it from Zenodo.DOI: 10.5281/zenodo.3516918
Project(s): OpenAIRE-Advance via OpenAIRE

Metrics:

See at: CNR ExploRA